Mapping the complexity of transcription control in higher eukaryotes
Recent large analyses suggest the importance of combinatorial regulation by broadly expressed transcription factors rather than expression domains characterized by highly specific factors
Strategies for Identifying RNA Splicing Regulatory Motifs and Predicting Alternative Splicing Events
Spatial preferences of microRNA targets in 3' untranslated regions
Background: MicroRNAs are an important class of regulatory RNAs which repress animal genes by preferentially interacting with complementary sequence motifs in the 3' untranslated region (UTR) of target mRNAs. Computational methods have been developed which can successfully predict which microRNA may target which mRNA on a genome-wide scale. Results: We address how predicted target sites may be affected by alternative polyadenylation events changing the 3'UTR sequence. We find that two thirds of targeted genes have alternative 3'UTRs, with 40% of predicted target sites located in alternative UTR segments. We propose three classes based on whether the target sites fall within constitutive and/or alternative UTR segments, and examine the spatial distribution of predicted targets in alternative UTRs. In particular, there is a strong preference for targets to be located in close vicinity of the stop codon and the polyadenylation sites. Conclusion: The transcript diversity seen in non-coding regions, as well as the relative location of miRNA target sites defined by it, has a potentially large impact on gene regulation by miRNAs and should be taken into account when defining, predicting or validating miRNA targets.
Phylogenetic simulation of promoter evolution: estimation and modeling of binding site turnover events and assessment of their impact on alignment tools
Phylogenetic simulations of promoter evolution were used to analyze functional site turnover in regulatory sequences
Deep learning for prediction of population health costs
Accurate prediction of healthcare costs is important for optimally managing
health costs. However, methods leveraging the medical richness from data such
as health insurance claims or electronic health records are missing. Here, we
developed a deep neural network to predict future cost from health insurance
claims records. We applied the deep network and a ridge regression model to a
sample of 1.4 million German insurants to predict total one-year health care
costs. Both methods were compared to Morbi-RSA models with various performance
measures and were also used to predict patients with a change in costs and to
identify relevant codes for this prediction. We showed that the neural network
outperformed the ridge regression as well as all Morbi-RSA models for cost
prediction. Further, the neural network was superior to ridge regression in
predicting patients with cost change and identified more specific codes. In
summary, we showed that our deep neural network can leverage the full
complexity of the patient records and outperforms standard approaches. We
suggest that the better performance is due to the ability to incorporate
complex interactions in the model and that the model might also be used for
predicting other health phenotypes
ssHMM: extracting intuitive sequence-structure motifs from high-throughput RNA-binding protein data
RNA-binding proteins (RBPs) play an important role in RNA post-transcriptional
regulation and recognize target RNAs via sequence-structure motifs. The extent
to which RNA structure influences protein binding in the presence or absence
of a sequence motif is still poorly understood. Existing RNA motif finders
either take the structure of the RNA only partially into account, or employ
models which are not directly interpretable as sequence-structure motifs. We
developed ssHMM, an RNA motif finder based on a hidden Markov model (HMM) and
Gibbs sampling which fully captures the relationship between RNA sequence and
secondary structure preference of a given RBP. Compared to previous methods
which output separate logos for sequence and structure, it directly produces a
combined sequence-structure motif when trained on a large set of sequences.
ssHMM’s model is visualized intuitively as a graph and facilitates biological
interpretation. ssHMM can be used to find novel bona fide sequence-structure
motifs of uncharacterized RBPs, such as the one presented here for the YY1
protein. ssHMM reaches a high motif recovery rate on synthetic data,
recovers known RBP motifs from CLIP-Seq data, and scales linearly with the
input size, being considerably faster than MEMERIS and RNAcontext on large
datasets while being on par with GraphProt. It is freely available on GitHub
and as a Docker image
Genome-wide search for miRNA-target interactions in Arabidopsis thaliana with an integrated approach
Orthologous Transcription Factors in Bacteria Have Different Functions and Regulate Different Genes
Transcription factors (TFs) form large paralogous gene families and have complex evolutionary histories. Here, we ask whether putative orthologs of TFs, from bidirectional best BLAST hits (BBHs), are evolutionary orthologs with conserved functions. We show that BBHs of TFs from distantly related bacteria are usually not evolutionary orthologs. Furthermore, the false orthologs usually respond to different signals and regulate distinct pathways, while the few BBHs that are evolutionary orthologs do have conserved functions. To test the conservation of regulatory interactions, we analyze expression patterns. We find that regulatory relationships between TFs and their regulated genes are usually not conserved for BBHs in Escherichia coli K12 and Bacillus subtilis. Even in the much more closely related bacteria Vibrio cholerae and Shewanella oneidensis MR-1, predicting regulation from E. coli BBHs has high error rates. Using gene–regulon correlations, we identify genes whose expression pattern differs between E. coli and S. oneidensis. Using literature searches and sequence analysis, we show that these changes in expression patterns reflect changes in gene regulation, even for evolutionary orthologs. We conclude that the evolution of bacterial regulation should be analyzed with phylogenetic trees, rather than BBHs, and that bacterial regulatory networks evolve more rapidly than previously thought
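The bidirectional best hit criterion mentioned in this abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes that BLAST bit scores are already available as plain dictionaries, and the function names and gene identifiers are hypothetical.

```python
def best_hits(scores):
    """Map each query to its top-scoring subject.

    scores: {(query, subject): bit_score}, an assumed input layout.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return pairs (a, b) where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical scores: genome A searched against genome B, and vice versa.
a_vs_b = {("a1", "b1"): 200, ("a1", "b2"): 50, ("a2", "b1"): 120}
b_vs_a = {("b1", "a1"): 195, ("b2", "a1"): 60}
pairs = bidirectional_best_hits(a_vs_b, b_vs_a)
```

Here `a2` also hits `b1` best, but `b1` prefers `a1`, so only the reciprocal pair is kept. As the abstract argues, such a pair is only a putative ortholog; confirming orthology requires a phylogenetic tree.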
Improving the Caenorhabditis elegans Genome Annotation Using Machine Learning
For modern biology, precise genome annotations are of prime importance, as they allow the accurate definition of genic regions. We employ state-of-the-art machine learning methods to assay and improve the accuracy of the genome annotation of the nematode Caenorhabditis elegans. The proposed machine learning system is trained to recognize exons and introns on the unspliced mRNA, utilizing recent advances in support vector machines and label sequence learning. In 87% (coding and untranslated regions) and 95% (coding regions only) of all genes tested in several out-of-sample evaluations, our method correctly identified all exons and introns. Notably, only 37% and 50%, respectively, of the presently unconfirmed genes in the C. elegans genome annotation agree with our predictions, thus we hypothesize that a sizable fraction of those genes are not correctly annotated. A retrospective evaluation of the Wormbase WS120 annotation [1] of C. elegans reveals that splice form predictions on unconfirmed genes in WS120 are inaccurate in about 18% of the considered cases, while our predictions deviate from the truth only in 10%–13%. We experimentally analyzed 20 controversial genes on which our system and the annotation disagree, confirming the superiority of our predictions. While our method correctly predicted 75% of those cases, the standard annotation was never completely correct. The accuracy of our system is further corroborated by a comparison with two other recently proposed systems that can be used for splice form prediction: SNAP and ExonHunter. We conclude that the genome annotation of C. elegans and other organisms can be greatly enhanced using modern machine learning technology
Evidence-ranked motif identification
A new computational method for the identification of regulatory motifs from large genomic datasets is presented here